Journal of Chemical Information and Modeling
● American Chemical Society (ACS)
Preprints posted in the last 30 days, ranked by how well they match Journal of Chemical Information and Modeling's content profile, based on 207 papers previously published here. The average preprint has a 0.21% match score for this journal, so anything above that is already an above-average fit.
Secker, C.; Secker, P.; Yergoez, F.; Celik, M. O.; Chewle, S.; Phuong Nga Le, M.; Masoud, M.; Christgau, S.; Weber, M.; Gorgulla, C.; Nigam, A.; Pollice, R.; Schuette, C.; Fackeldey, K.
The identification of suitable lead molecules in the vast chemical space is a critical and challenging task in drug discovery campaigns. Recently, it has been demonstrated that large-scale virtual screening provides a powerful approach to accelerate the identification of novel drug candidates by screening ever-increasing virtual ligand libraries, which have reached magnitudes of >10^20 compounds. However, this desirable increase in potentially bioactive molecules poses a new challenge as enumerating and virtually screening such huge compound libraries is computationally prohibitive. Consequently, advanced approaches to navigate ultra-large chemical spaces and to identify suitable candidate molecules therein are urgently needed. Here, we present an evolutionary algorithm framework using molecular generative AI, reaction-based substructure searching, and iterative model fine-tuning for a targeted and efficient exploration of chemical fragment spaces. Combining this approach with large-scale virtual screening we are able to identify target-specific candidate molecules within the commercially available Enamine REAL Space (~10^15). We demonstrate the applicability of the approach by successfully identifying and biochemically validating pH-specific ligands of the μ-opioid receptor. Our results demonstrate that integrating generative AI with evolutionary algorithms provides a promising route to explore ultra-large chemical spaces for the discovery of novel, synthetically accessible lead molecules.
Chowdhury, T. D.; Shafoyat, M. U.; Hemel, N. H.; Nizam, D.; Sajib, J. H.; Toha, T. I.; Nyeem, T. A.; Farzana, M.; Haque, S. R.; Hasan, M.; Siddiquee, K. N. e. A.; Mannoor, K.
Alzheimer's disease remains a major therapeutic challenge, and no β-secretase (BACE1) inhibitor has achieved clinical approval. A key limitation of prior discovery efforts is reliance on single-parameter optimization, often resulting in candidates with limited translational potential. In this study, we developed a biology-informed computational framework integrating meta-ensemble QSAR modeling, molecular docking, Protein Language Model (ESM-1b)-guided residue interaction weighting, and ADMET profiling within a normalized multi-parameter ranking scheme. Model performance was validated using cross-validation, external validation, and Y-randomization (n = 100; p = 0.009), while applicability domain analysis based on Tanimoto similarity highlighted reduced reliability for extrapolative predictions. Sensitivity analysis showed high ranking stability under moderate perturbations (Spearman ρ = 0.998 for ±10%; 0.963 for ±25%), with reduced agreement under randomized weighting (ρ = 0.821), indicating that prioritization is robust but influenced by weight selection. Screening of 16,196 compounds identified 153 predicted actives (accuracy = 0.852; ROC-AUC = 0.920), which were refined to 111 candidates and seven prioritized leads. Molecular dynamics simulations (200 ns) indicated stable binding and persistent catalytic interactions, with Mol-2 showing favorable dynamic stability and ADMET characteristics. Overall, this study presents an interpretable and quantitatively evaluated framework for multi-parameter compound prioritization, supporting more reliable virtual screening in early-stage CNS drug discovery.
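The weight-sensitivity analysis described in this abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: the feature matrix is random placeholder data, the four weights are hypothetical, and the function names (`composite_score`, `perturbed_rho`) are invented for illustration. The sketch min-max normalizes each parameter, forms a weighted composite score, rescales the weights by a random factor within a given fraction, and reports the Spearman rank correlation against the baseline ranking.

```python
import numpy as np

def rank_scores(values):
    """Return 0-based ranks (0 = smallest value); assumes no tied values."""
    order = np.argsort(values)
    ranks = np.empty(len(values), dtype=float)
    ranks[order] = np.arange(len(values))
    return ranks

def spearman_rho(a, b):
    """Spearman correlation computed as the Pearson correlation of the ranks."""
    return float(np.corrcoef(rank_scores(a), rank_scores(b))[0, 1])

def composite_score(features, weights):
    """Weighted sum of min-max normalized feature columns."""
    lo, hi = features.min(axis=0), features.max(axis=0)
    normalized = (features - lo) / (hi - lo)
    return normalized @ (weights / weights.sum())

def perturbed_rho(features, weights, fraction, rng):
    """Rho between the baseline ranking and one with each weight scaled by up to +/- fraction."""
    noise = rng.uniform(1.0 - fraction, 1.0 + fraction, size=weights.shape)
    baseline = composite_score(features, weights)
    perturbed = composite_score(features, weights * noise)
    return spearman_rho(baseline, perturbed)

# Placeholder data: 111 compounds x 4 parameters (hypothetical, not from the study).
rng = np.random.default_rng(0)
features = rng.random((111, 4))
weights = np.array([0.4, 0.3, 0.2, 0.1])  # hypothetical weights
for fraction in (0.10, 0.25):
    print(f"+/-{fraction:.0%}: rho = {perturbed_rho(features, weights, fraction, rng):.3f}")
```

In practice one would average rho over many random perturbations per fraction rather than report a single draw.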
Nada, H.; Sipos-Szabo, L.; Bajusz, D.; Keseru, G.; Gabr, M.
Despite advances in computational drug discovery, de novo drug design remains hindered by high licensing costs and the need for specialized programming expertise. We present LigandForge, a webserver for structure-guided de novo ligand generation. LigandForge integrates structural validation and binding-site characterization; voxel-based property grid construction for spatial mapping of electrostatics and hydrophobicity; chemistry-aware fragment assembly; multi-objective lead optimization; and retrosynthetic feasibility analysis. The platform utilizes a structure-guided framework to assemble molecules from curated fragment libraries while enforcing physicochemical constraints, including molecular weight, LogP, and hybridization states. Generated molecules are refined via reinforcement learning and genetic algorithms and are subsequently evaluated using composite metrics such as the quantitative estimate of drug-likeness. By leveraging RDKit for cheminformatics and NGL viewer for real-time 3D visualization, LigandForge provides a synthesis-aware environment that bridges the gap between macromolecular structural data and experimentally feasible lead compounds without requiring local software installation.
Poelmans, R.; Bruncsics, B.; Arany, A.; Van Eynde, W.; Shemy, A.; Moreau, Y.; Voet, A. R.
Knowledge-based potentials (KBPs) have long been used to score protein-ligand interactions, yet existing formulations remain isotropic, capturing only distance dependencies and neglecting the directional preferences that govern molecular recognition. Here, we introduce Direction-Enhanced Scoring POTentials (DESPOT), an anisotropic knowledge-based framework that unifies pose scoring and binding-site characterisation within a single probabilistic model. The new probabilistic formulation used in DESPOT naturally supports directional modelling through atom type-specific local reference frames and symmetry-aware geometric discretisation. It also supports steric exclusion, encoded as a dedicated void state that explicitly captures the probability that a spatial bin remains unoccupied. The anisotropic interaction profiles learned by DESPOT reveal systematic directional preferences for interactions such as hydrogen bonds, aromatic interactions, and halogen bonds that extend beyond idealised geometric models. Evaluation on the CASF-2016 benchmark shows that DESPOT substantially outperforms isotropic KBPs in all pose-discrimination and virtual screening tasks (p << 0.0001 for all enrichment factors), with the largest gains arising from its ability to penalise geometrically implausible poses. Constrained energy minimisation of training structures proves strongly beneficial for the derivation of KBPs, while our train-test leakage analysis reveals that overfitting is an underestimated and understudied issue for KBPs. DESPOT provides a data-driven framework for direction-aware modelling of protein-ligand interactions, with applications in pose scoring, binding-site characterisation, and structure-based design.
Shi, Z.; Gao, X.; Xu, M.; Zhu, X.; Wang, P.; Yang, Y.; Yang, Z.; Zhou, R.
Despite rapid progress in AI agents for computer-aided drug design (CADD), protein-ligand simulation workflows remain fragmented across disparate tools, creating a major bottleneck for scalable candidate evaluation. Here, we present PRISM (Protein-Receptor Interaction Simulation Modeler), a Python platform built on GROMACS that unifies ligand parameterization across multiple force fields, automated system construction, enhanced sampling, multi-tier binding free energy estimation, and trajectory analysis within a single workflow. Through the Model Context Protocol (MCP), PRISM further serves as the computational infrastructure for CADD-Agent, an expert-workflow-driven AI agent designed to orchestrate hierarchical drug screening pipelines. As a pilot application, we applied PRISM to riboflavin synthase and demonstrated end-to-end automation from candidate library assembly to binding pocket characterization, identifying a potential allosteric inhibition site at the oligomerization interface. Together, these results establish PRISM as a high-throughput simulation infrastructure for agent-enabled CADD.
Jimenez Garcia, J. C.; Lopez-Gallego, F.; Lopez, X.; De Sancho, D.
The rational design of biomolecule immobilization strategies requires molecular-level understanding of how surface properties, tethering geometry, and structural dynamics jointly influence stability and function. Recently, coarse-grained molecular dynamics simulations based on the Martini force field have emerged as an efficient framework for studying enzyme-surface interactions. However, the reproducible construction of immobilized systems with controlled orientations remains technically challenging, usually involving multiple computational tools. Here we present MartiniSurf, an open-source command-line framework for the preparation of protein and DNA systems immobilized on solid supports within the Martini paradigm. MartiniSurf integrates automated structure retrieval and cleaning, coarse graining via tools from the Martini force field software ecosystem, customizable surface generation, and biomolecule orientation based on user-defined anchoring residues, producing complete GROMACS-ready simulation systems. The framework supports both implicit restraint-based anchoring and explicit linker-mediated immobilization, including surfaces functionalized with user-defined ligands or linker-like moieties, enabling representation of mono- and multivalent attachment geometries at different modeling resolutions. Structure-based GōMartini potentials can be incorporated for proteins, while DNA systems are modeled using Martini 2. Optional substrate insertion, pre-coarse-grained complex handling, and automated solvation and ionization further extend system flexibility. By integrating these components into a unified workflow, MartiniSurf enables systematic and high-throughput in silico exploration of surface-tethered biomolecules and provides a robust computational platform for rational immobilization studies.
Rachman, M. M.; Iliopoulos-Tsoutsouvas, C.; Dominic Sacco, M.; Xu, X.; Wu, C.-G.; Santos, E.; Glenn, I. S.; Paris, L.; Cahill, M. K.; Ganapathy, S.; Tummino, T. A.; Moroz, Y. S.; Radchenko, D. S.; Okorie, M.; Tawfik, V. L.; Irwin, J. J.; Makriyannis, A.; Skiniotis, G.; Shoichet, B. K.
Cannabinoid receptors are therapeutically promising GPCRs that are also interesting test systems for structure-based methods, which have targeted them previously. Here we used the CB2 receptor as a template to explore several topical questions in library docking. Whereas an earlier campaign against the CB1 receptor led to potent but relatively non-selective ligands, here we found that targeting interactions with polar, orthosteric site residues led to subtype-selective ligands. Docking hit rate and especially hit affinity improved in moving from a 7 million to a 2.6 billion molecule library. Similar to earlier studies, docking against active and inactive states of the receptor did not reliably bias toward the discovery of agonists or inverse agonists. Cryo-EM structures of two of the new agonists, each in a different chemotype, superposed well on the docking predictions. Correspondingly, structure-based optimization led to 10- to 140-fold improvements within three different series, also consistent with well-behaved ligand families. Hit rates with a fully enumerated 2.6 billion molecule library resembled those of an implied 11 billion molecule library from a building-block method, consistent with the latter's ability to explore this space, though higher affinities were discovered from the fully enumerated set. Overall, eight diverse families of ligands, with potencies <100 nM and mostly unrelated to previously known ligands, were found. Implications for future studies are considered.
Wiebeler, C.; Falkner, S.; Schwierz, N.
Accurate ion force fields are essential for molecular dynamics simulations of biomolecular systems, particularly in combination with modern water models such as OPC. While OPC water improves the description of bulk water and biomolecules, the transferability of existing ion force fields to this model remains an open question. Here, we systematically assess the transferability of monovalent and divalent ion force field parameters (Li⁺, Na⁺, K⁺, Cs⁺, Mg²⁺, Ca²⁺, Sr²⁺, Ba²⁺, Cl⁻ and Br⁻) to OPC water by comparing single-ion and ion-pairing properties with experimental data. Our analysis reveals that no single literature parameter set provides accurate results for all ions when directly transferred to OPC water. We hence introduce the MS/G-LB(OPC) force field, which combines Mamatkulov-Schwierz-Grotz cation parameters with Loche-Bonthuis anion parameters. MS/G-LB(OPC) reproduces hydration free energies, first-shell structural properties and activity derivatives at low salt concentrations. Our results demonstrate that transferring ion parameters to OPC can lead to significant and ion-specific deviations from experimental data, making careful validation essential. At the same time, the systematic transfer and combination of ion parameters from existing force fields can provide a practical and computationally efficient alternative to full reparameterization. MS/G-LB(OPC) is available at https://git.rz.uni-augsburg.de/cbio-gitpub/opc-ion-force-fields.
Otten, L.; Leung, J. M. G.; Chong, L.; Zuckerman, D. M.
Recently, a number of tools have been released that generate ensembles of protein structures based on artificial intelligence (AI) approaches. Although ensembles generated by the tools differ significantly, we demonstrate a computational path to harmonizing the various outputs under a stationary condition using two complementary physics-based approaches. In the first stage, the AI ensemble is used to seed a weighted ensemble (WE) simulation, promoting relaxation toward the steady state. In the second stage, trajectory segments generated by WE are reweighted to steady state using the recently developed RiteWeight (RW) algorithm. We applied this approach to generate an atomically-detailed equilibrium ensemble of unliganded adenylate kinase conformations, starting from ensembles produced by three AI tools: AFSample2, ESMFlow-PDB (trained from PDB structures), and ESMFlow-MD (trained from molecular dynamics simulation data). Dramatic differences in the AI-generated ensembles are largely erased during the WE-RW process, yielding a consistent description of the equilibrium ensemble for a given force field.
Liu, T.; Jiang, S.; Zhang, F.; Sun, K.; Head-Gordon, T.; Zhao, H.
Large language models (LLMs) are in the ascendancy for research in drug discovery, offering unprecedented opportunities to reshape drug research by accelerating hypothesis generation, optimizing candidate prioritization, and enabling more scalable and cost-effective drug discovery pipelines. However, there is currently a lack of objective assessments of LLM performance to ascertain their advantages and limitations over traditional drug discovery platforms. To tackle this emergent problem, we have developed DrugPlayGround, a framework to evaluate and benchmark LLM performance for generating meaningful text-based descriptions of physicochemical drug characteristics, drug synergism, drug-protein interactions, and the physiological response to perturbations introduced by drug molecules. Moreover, DrugPlayGround is designed to work with domain experts to provide detailed explanations for justifying the predictions of LLMs, thereby testing LLMs for chemical and biological reasoning capabilities to push their greater use at the frontier of drug discovery at all of its stages.
Nair, V.; Niknam Hamidabad, M.; Erol, D.; Mansbach, R.
There has been a surge in antibiotic resistance in recent years, making traditional antibiotics less effective against key pathogens. RNA has recently emerged as a potential target for antibiotics due to its involvement in crucial microbial functions. It is possible to expand the range of therapeutic targets by using RNA-based therapies, but it remains necessary to improve the molecular-level understanding of interactions between RNA and known and potential binders. The SAM-I riboswitch, which controls the transcriptional termination of gene expression involved in sulfur metabolism in most bacteria, is an excellent ligand target. Thus, understanding its behavior with and without bound ligands would be very helpful for drug design applications. In this manuscript, we studied the interactions between the SAM-I riboswitch and its natural ligand, SAM, which controls riboswitch function, and compared those interactions to its interactions with the very similar small molecule SAH, which does not control riboswitch function, and to its interactions with a potential binder, JS4, identified via virtual screening. From our simulations, we gain a deeper understanding of small molecule interactions with the SAM-I riboswitch. The results reveal how differently the small molecules (SAM, SAH and JS4) bind to and potentially induce conformational changes in the riboswitch. Our findings offer valuable insight into the molecular mechanisms underlying riboswitch RNA-ligand interactions for the design of more effective RNA-targeting therapeutics.
Zou, R.; Nag, S.; Sousa, V.; Moren, A. F.; Toth, M.; Meynaq, Y. K.; Pedergnana, E.; Valade, A.; Mercier, J.; Vermeiren, C.; Motte, P.; Zhang, X.; Svenningsson, P.; Halldin, C.; Varrone, A.; Agren, H.
Synaptic vesicle glycoproteins 2 (SV2) are integral membrane proteins essential for neurotransmitter release and are implicated in neurological disorders including epilepsy and Parkinson's disease. In the attempt to develop a ligand selective for SV2C, and in collaboration with UCB, UCB-F was identified as a potential candidate. However, the affinity of UCB-F to SV2C was found to be temperature dependent, decreasing by about 10-fold from 4 °C to 37 °C. UCB1A was subsequently identified as an SV2C ligand displaying a 100-fold in vitro selectivity for SV2C compared with SV2A. In this study we investigated whether the binding of UCB1A to SV2A and SV2C was affected by temperature. A combination of experimental binding assay data and molecular dynamics (MD) simulations was used. The binding studies revealed that UCB1A affinity for SV2A decreased significantly at 37 °C compared with 4 °C, whereas binding to SV2C remained largely unchanged. MD simulations reproduced these observations: ligand RMSD values at 310 K showed that UCB1A binding fluctuated markedly in the SV2A complex, with many trajectories exceeding the 3.0 Å stability cutoff, whereas UCB1A remained relatively well-anchored in SV2C under the same conditions. Structural analysis showed that, while UCB1A adopts a conserved binding pose across all isoforms stabilized by π-π stacking and a hydrogen bond with Asp, SV2C possesses a unique stabilizing feature. In SV2C, Tyr298 is less exposed to the solvent and engages in a persistent hydrogen bond with asparagine, a structural feature that reinforces pocket stability and limits temperature-induced destabilization. This interaction is absent in SV2A, consistent with its greater temperature sensitivity. Together, these findings provide a mechanistic explanation for the experimentally observed temperature independence of UCB1A binding to SV2C.
More broadly, the results highlight the importance of incorporating physiologically relevant temperatures into SV2 ligand evaluation and demonstrate how combining experiments with simulations can uncover isoform-specific mechanisms of ligand recognition and stability.
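The RMSD-based stability criterion mentioned in this abstract can be sketched as follows. This is an illustrative reading only, not the authors' analysis script: it assumes ligand coordinates in Å that have already been aligned on the protein, and the function names and toy coordinates are hypothetical. Only the 3.0 Å cutoff comes from the abstract.

```python
import numpy as np

def ligand_rmsd(frames, reference):
    """Per-frame ligand RMSD without re-fitting.

    frames: array of shape (n_frames, n_atoms, 3), pre-aligned on the protein.
    reference: array of shape (n_atoms, 3), e.g. the starting binding pose.
    """
    diff = frames - reference
    return np.sqrt((diff ** 2).sum(axis=2).mean(axis=1))

def fraction_above_cutoff(rmsd, cutoff=3.0):
    """Fraction of frames whose RMSD exceeds the stability cutoff (in the RMSD's units)."""
    return float((rmsd > cutoff).mean())

# Toy example: one frame identical to the reference, one rigidly shifted by 4 Å along x.
reference = np.zeros((5, 3))
frames = np.stack([reference, reference + np.array([4.0, 0.0, 0.0])])
rmsd = ligand_rmsd(frames, reference)
print(rmsd, fraction_above_cutoff(rmsd))
```

Comparing this fraction between the SV2A and SV2C trajectories would give one simple number for the qualitative difference the abstract describes.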
Mlynsky, V.; Kuehrova, P.; Bussi, G.; Otyepka, M.; Sponer, J.; Banas, P.
Understanding RNA structural dynamics is essential for elucidating its biological functions, and molecular dynamics (MD) simulations provide an important atomistic complement to experimental approaches. However, the predictive power of MD is fundamentally limited by the accuracy of the underlying empirical Force Fields (FFs), particularly in capturing the delicate balance of non-bonded interactions. Here, we present a systematic reparameterization strategy that replaces the external gHBfix19 hydrogen-bond (H-bond) correction potential with an equivalent set of NBfix Lennard-Jones modifications within a state-of-the-art RNA FF. Using a quantitatively converged temperature replica-exchange MD ensemble of the GAGA tetraloop, we employed a reweighting-based optimization protocol to derive NBfix parameters that reproduce the thermodynamic effects of the original gHBfix19 terms. Sequential optimization of individual gHBfix19 components proved essential to ensure stable and transferable parameter refinement. The resulting fully reformulated NBfix-based variant, termed OL3CP-NBfix19, was validated on a representative set of RNA motifs, including tetranucleotides, A-form duplexes, and tetraloops. Across all tested systems, its performance is comparable to that of the reference gHBfix19 FF. By embedding the H-bond corrections directly into the standard non-bonded framework, the NBfix formulation eliminates external biasing potentials, simplifies practical deployment, and reduces computational overhead. Beyond this specific reparameterization, our results demonstrate a practical workflow for translating targeted H-bond corrections into native FF terms for efficient biomolecular simulations.
Zhu, Q.; Yu, H.
Amyloid beta (Aβ), one of the hallmark proteins of Alzheimer's disease (AD), aggregates into plaques that are strongly linked to cognitive decline and neuronal death. Reducing its aggregation propensity may provide a strategy to slow the progression of AD. While chirality modulation has emerged as an innovative approach to disrupt this process, research has primarily focused on alterations at the Cα position, often overlooking the impact of the second chiral center, such as the Cβ atom of threonine. Furthermore, the underlying mechanisms governing these chiral effects remain elusive. Given the intrinsically disordered nature of the Aβ peptide, we employed temperature-replica exchange molecular dynamics (T-REMD) simulations to explore its rugged conformational landscape. We considered sequence mutations (A2T, A2V), N-terminal chirality inversion of the first six residues (A2V1-6D and WT1-6D), and alteration of the second chiral center (Cβ) of threonine (A2TCβ). By analyzing the effect size and population change induced by these mutations and chiral modulation, we concluded that the modulation at the N-termini is not confined locally but also exerts specific effects on the central hydrophobic core (CHC) region. Inspection of their free energy landscape and representative structures reveals that the protective or pathogenic effects of these variants correlate with their similarity to the wild type (WT) ensemble. Beyond these static thermodynamics analyses, a direct connection to phase transitions was made by estimating heat capacity as a function of temperature. Both analyses predict that A2TCβ may exert a pathogenic effect, in contrast to the protective nature of A2T. These findings offer a deeper understanding of the effects of site-specific mutations and chirality and shed light on the development of advanced therapeutic strategies for AD.
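The abstract does not say which heat-capacity estimator was used. A common choice for replica-exchange data, shown here purely as an assumption, is the energy-fluctuation formula evaluated at each replica temperature T, where E is the potential energy sampled at that temperature and k_B is the Boltzmann constant:

```latex
C_V(T) = \frac{\langle E^2 \rangle_T - \langle E \rangle_T^{2}}{k_B T^{2}}
```

A peak in C_V(T) is conventionally read as the signature of a cooperative transition between conformational ensembles.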
Yamauchi, M.; Murata, Y.; Niina, T.; Takada, S.
There is a growing demand for molecular dynamics simulations to explore longer timescale behavior of giant protein-DNA complexes such as chromatin. To address this need, we extended OpenCafeMol, a GPU-accelerated residue-level coarse-grained molecular dynamics simulator originally developed for proteins and lipids, to support 3SPN.2 and 3SPN.2C DNA models. We also implemented a hydrogen-bond-type many-body potential to model DNA-protein interactions more accurately. To further improve computational efficiency, we introduced a localized scheme for calculating base-pairing and cross-stacking interactions. Benchmark tests show that OpenCafeMol on a single GPU achieves up to 200-fold speed-up for DNA-only systems and up to 100-fold speed-up for DNA-protein complexes compared to CPU-based simulations. To demonstrate the capability of our implementation for long-timescale biological processes, we simulated an archaeal SMC-ScpA complex undergoing DNA translocation via segment capture (a proposed mechanism for DNA loop extrusion) in the presence of a DNA-bound obstacle. We observed continuous captured-loop growth accompanied by obstacle bypass within the segment capture framework.
Sun, K.; Head-Gordon, T.
Protein kinases are critical drug targets, requiring therapeutics that can modulate their active and inactive conformational states. While cofolding models can generate global folds directly from kinase sequences and ligand SMILES strings, these models have not yet been tested on their ability to recover ligand induced-fit conformational states of the kinase proteins. Here, we introduce KinConfBench, a curated benchmark of 2,225 high-quality human kinase chains to evaluate the ability of three state-of-the-art cofolding models (Boltz-2, Chai-1, and Protenix) to recover both canonical and rare conformational states. We show that geometric success metrics of a ligand pose in the active site do not correlate strongly with the correct kinase conformational state, motivating a new set of dynamical benchmarks for assessing cofolding models. While all three cofolding models achieve ~65-75% prediction accuracy for kinase conformational classification, they exhibit severe mode collapse when performing multiple inferences, show negligible structural diversity in sampling induced-fit motions, and display a prevalent "apo-drift" in which all three cofolding models predominantly predict the kinase to be in its ligand-free state. Our results highlight that capturing ligand-induced protein conformational diversity, not just geometric fit, is critical for next-generation structure-based drug discovery.
Murcia Garcia, E.; Tian, N.; Alonso Fernandez, J. R.; Cai, X.; Yang, D.; Hernandez Morante, J. J.; Perez Sanchez, H.
The glucagon-like peptide-1 receptor (GLP-1R) plays a central role in metabolic regulation and is a major therapeutic target for obesity and diabetes. Peptide agonists targeting the GLP-1R, such as semaglutide, remain among the most effective regulators of glucose metabolism and appetite. Nonetheless, recent reports of weight regain have limited the effectiveness of GLP-1R peptide agonists, sustaining the interest in expanding the chemical diversity of GLP-1R ligands through drug discovery strategies. However, the structural complexity and conformational plasticity of class B1 GPCRs make conventional single-method virtual screening approaches prone to bias and limited chemotype recovery. Using an integrated ligand- and structure-based virtual screening pipeline, explicitly combining complementary ligand-based descriptors, multi-fingerprint similarity, electrostatic similarity, pharmacophore modeling, and multi-conformation docking under a consensus-driven selection strategy, we were able to identify three chemically distinct classes of GLP-1R agonist candidates: GQB47810, a non-peptidic molecule; neuromedin C, a peptide; and 2,5-Pen-enkephalin (DPDPE), a small peptide. Of these, DPDPE showed the greatest effectiveness, reaching values similar to those of GLP-1, although with lower potency. Further in vitro characterization confirmed that pen-enkephalin behaved as a full agonist and exhibited dual GLP-1R/GIPR agonistic activity. These findings establish a consensus-driven and transferable computational framework for chemotype-diverse agonist discovery at conformationally flexible GPCR targets, and reveal a pentapeptide with GLP-1-like efficacy as a promising lead for next-generation small peptide therapeutics.
Upadhyay, S.; Roggia, M.; Yuan, S.; Cosconati, S.; Gabr, M.
Targeting protein-protein interactions (PPIs) with small molecules is historically challenging due to shallow, solvent-exposed interfaces that lack classical binding pockets. Furthermore, employing traditional structure-based virtual screening (SBVS) across ultra-large chemical spaces to find novel chemotypes imposes prohibitive computational bottlenecks. Here, we report the first prospective, real-world application of the PyRMD2Dock platform, an AI-enforced SBVS workflow that integrates machine learning and standard docking available within the PyRMD Studio suite. To target the structurally demanding immune receptor CD28, a chemically diverse subset of 2.4 million molecules from the Enamine REAL Diversity Space was docked into a cleft adjacent to the canonical ligand interface. These data were used to train 672 classification models, and the optimized model rapidly screened the remaining ~46 million compounds. Following interaction filtering and clustering, 232 highly prioritized ligands were identified. Experimental validation of 150 purchased candidates yielded a remarkable hit rate, identifying multiple direct CD28 binders. Lead compounds 100 and 104 exhibited submicromolar affinity (Kd = 343.8 nM and 407.1 nM, respectively), potent CD28-CD80 disruption, and functional blockade in cellular reporter assays. Furthermore, these compounds successfully reduced cytokine secretion in primary human tumor-PBMC and epithelial tissue co-culture models. This study validates PyRMD2Dock as a highly scalable, effective protocol for mining massive chemical libraries to discover small-molecule modulators of challenging immune receptor interfaces.
Teshirogi, Y.; Terada, T.
Molecular dynamics (MD) simulations are a powerful tool for investigating biomolecular dynamics underlying biological functions. However, the accessible spatiotemporal scales of conventional all-atom simulations remain limited by high computational costs. Coarse-graining reduces these costs by decreasing the number of interaction sites and enabling longer timesteps. In extreme cases, proteins are represented as single spherical particles; while such approximations facilitate cellular-scale simulations, they often sacrifice essential structural information, such as molecular shape and interaction anisotropy. Here, we present CGRig, a rigid-body protein model with residue-level interaction sites designed for long-time, large-scale simulations. In CGRig, each protein is treated as a single rigid body embedding residue-level interaction sites. Its translational and rotational motions are described by the overdamped Langevin equation incorporating a shape-dependent friction matrix. Intermolecular interactions are calculated using Gō-like native contact potentials, Debye-Hückel electrostatics, and volume exclusion. We validated that CGRig accurately reproduces the translational and rotational diffusion coefficients expected from the friction matrix for an isolated protein. For dimeric systems, the model successfully maintained native complex structures. Furthermore, two initially separated proteins converged into the correct complex with an association rate consistent with all-atom simulations. Notably, CGRig achieved a simulation performance exceeding 17 s/day for a 1,024-molecule system. These results demonstrate that CGRig provides an efficient framework for simulating protein assembly while retaining residue-level interaction specificity, making it a valuable tool for investigating large-scale biomolecular self-assembly.
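The overdamped Langevin dynamics named in this abstract can be illustrated with a deliberately simplified sketch. It is not the CGRig integrator: it treats translation only and replaces the shape-dependent friction matrix with a single scalar diffusion coefficient D, using the standard Euler-Maruyama update x ← x + (D/kT)·F·dt + sqrt(2·D·dt)·ξ with ξ drawn from a unit normal. Function names, units, and parameter values are hypothetical. The free-diffusion check mirrors the kind of validation the abstract describes, comparing the mean squared displacement against the expected 6·D·t.

```python
import numpy as np

def brownian_step(positions, forces, diffusion, kT, dt, rng):
    """One Euler-Maruyama step of overdamped Langevin dynamics (isotropic, translation only)."""
    drift = (diffusion / kT) * forces * dt
    noise = np.sqrt(2.0 * diffusion * dt) * rng.standard_normal(positions.shape)
    return positions + drift + noise

def free_diffusion_msd(n_particles, n_steps, diffusion, dt, seed=0):
    """Mean squared displacement of force-free particles after n_steps steps."""
    rng = np.random.default_rng(seed)
    start = np.zeros((n_particles, 3))
    pos = start.copy()
    zero_force = np.zeros_like(pos)
    for _ in range(n_steps):
        pos = brownian_step(pos, zero_force, diffusion, kT=1.0, dt=dt, rng=rng)
    return float(((pos - start) ** 2).sum(axis=1).mean())

# Reduced units; the expected MSD in 3D is 6 * D * t.
D, dt, n_steps = 0.5, 0.01, 100
msd = free_diffusion_msd(n_particles=20000, n_steps=n_steps, diffusion=D, dt=dt)
expected = 6.0 * D * n_steps * dt
print(f"measured MSD = {msd:.3f}, expected = {expected:.3f}")
```

A rigid-body model additionally needs a rotational update and a full friction (or mobility) matrix, so that drift and noise amplitudes differ along the body axes.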
Cui, T.; Wang, Z.; Wang, T.
AI-based molecular dynamics simulation brings ab initio calculations to biomolecules in an efficient way, with the machine learning force field (MLFF) playing a central role by accurately predicting molecular energies and forces. Most existing MLFFs assume localized interatomic interactions, limiting their ability to accurately model non-local interactions, which are crucial in biomolecular dynamics. In this study, we introduce ViSNet-PIMA, which efficiently learns non-local interactions through a physics-informed multipole aggregator (PIMA) and accurately encodes molecular geometric information. ViSNet-PIMA outperforms all state-of-the-art MLFFs for energy and force predictions of different kinds of biomolecules and various conformations on the MD22 and AIMD-Chig datasets, while adapting the PIMA blocks into other MLFFs further achieves 55.1% performance gains, demonstrating the superiority of ViSNet-PIMA and the universality of the model design. Furthermore, we propose AI2BMD-PIMA to incorporate ViSNet-PIMA into the AI2BMD simulation program by introducing a "Transfer Learning-Pretraining-Finetuning" scheme and replacing molecular mechanics-based non-local calculations among protein fragments with ViSNet-PIMA, which reduces AI2BMD's energy and force calculation errors by more than 50% for different protein conformations and protein folding and unfolding processes. ViSNet-PIMA advances ab initio calculation for entire biomolecules, amplifying the application value of AI-based molecular dynamics simulations and property calculations in biochemical research.